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Abstract 

To support and guide an extensive experimental research into sys- 
tems biology of signaling pathways, increasingly more mechanistic mod- 
els are being developed with hopes of gaining further insight into bio- 
logical processes. In order to analyse these models, computational and 
statistical techniques are needed to estimate the unknown kinetic pa- 
rameters. This chapter reviews methods from frequentist and Bayesian 
statistics for estimation of parameters and for choosing which model 
is best for modeling the underlying system. Approximate Bayesian 
Computation (ABC) techniques are introduced and employed to ex- 
plore different hypothesis about the JAK-STAT signaling pathway. 

1 Introduction 

It is crucial for cells to be able to sense the environment and react to 
changes in it. This is done through signaling pathways, which involve 
complex networks of often non-linear interactions between molecules. 
These interactions transduce a signal from outside the cell to trigger 
a functional change within a cell. Signaling pathways are important 
for differentiation, survival and adaptation to varying external con- 
ditions. The dynamics of these pathways are currently a subject of 
extensive experimental and computational research [1-5]; and some of 
the most important biological signaling pathways have received con- 
siderable attention from mathematical modellers and theoretical sys- 
tems biologists. These signaling networks include MAPK and Ras- 
Raf-ERK [6-14], JAK-STAT [15-17], GPCR [18], NF-kB [4,19]. It 
has become abundantly clear, however, that these signaling pathways 
can not be separated from one another, but that they interact; this 
phenomenon is known as crosstalk [20]. 

A series of modeling approaches have been applied to the study of 
signaling pathways. Most widely used are ordinary differential equa- 
tion (ODE) models that follow mass action kinetics. Also Boolean and 
Bayesian networks and Petri nets have been employed for modeling and 
simulation. The rich formalism underlying these different approaches 
has provided us with efficient tools for the analysis of signaling models. 



Computational approaches have been mainly used for simulation and 
for studying qualitative properties of signaling pathways such as motifs 
and feedback loops [21,22], and quantitative properties such as signal 
duration, signal amplitude and amplification [23-25]. 

To develop and utilize detailed quantitative signaling models we re- 
quire the values of all the parameters, such as kinetic rates of protein 
turnover and post-translational modifications (e.g. phosphorylation or 
dimcrization) . Due to technological restrictions and cost it is impos- 
sible to measure all the parameters experimentally. In this chapter 
we review computational tools that can be used for parameter infer- 
ence for ODE models. While many studies have dealt with the subject 
of parameter estimation, relatively little attention has been given to 
model selection; that is, which model(s) to use for inference. Despite 
this, "What is the best model to use?" is probably the most critical 
question for making valid inference from the data [26], and this is the 
second topic that we touch on in this chapter. 

There are two broad schools of thought in statistical inference: fre- 
quentist and Bayesian. In frequentist statistics one talks about point 
estimates and confidence intervals around them. The likelihood func- 
tion is a central concept in statistical inference, and is used in both 
frequentist and Bayesian settings. It equals the probability of the data 
given the parameters, and it is a function of parameters, 

L(9) = P(D\9). 

The cannonical way of obtaining the point estimate is by taking a 
maximum likelihood estimate; i.e. the set of parameters for which 
the probability of observing the data is highest. On the other hand, 
Bayesian statistics is based on probability distributions. Here one aims 
to obtain the posterior probability distribution over the model param- 
eters, which is proportional to the product of a suitable prior distri- 
bution (which summarizes the user's prior knowledge or expectations) 
and the likelihood (the information that is obtained from the data). 

In the following sections we will review how frequentist and Bayesian 
statistics can be used to estimate parameters of ODE models of signal- 
ing pathways, and how to choose which model has the highest support 
from the data. We then outline an approximate Bayesian computa- 
tion algorithm based on Sequential Monte Carlo and apply it to the 
JAK-STAT signaling pathway where we will illustrate aspects related 
to parameter estimation and model selection. 

2 Parameter inference 

Signaling pathway models include numerous parameters, and it is gen- 
erally impossible to obtain all of these values by experimental measure- 
ments alone. Therefore parameter inference (also referred to as model 
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calibration, model fitting or parameter estimation by different authors) 
algorithms can be used to estimate these parameter values computa- 
tionally. A variety of different approaches have been developed and 
are being used used; they all share the two main ingredients: a cost 
function, which reflects and penalizes the distance between the model 
and experimental data, and an optimization algorithm, which searches 
for parameters that optimize the cost function. The most commonly 
used cost functions in a frequentist approach are the likelihood (one 
wants to maximize it) and the least squares error (one wants to min- 
imize it). The Bayesian equivalent to a cost function is the Bayesian 
posterior distribution. 

There are many different kinds of optimization algorithms. Their goal 
is to explore the landscape defined by cost function and find the op- 
timum (i.e. minimum or maximum, depending on the type of cost 
function used). The simplest are the local gradient descent methods 
(e.g. Newton's method, Levenberg-Marquardt). These methods are 
computationally fast, but are only able to find local optima. When 
the cost function landscape is complex, which is often the case for sig- 
naling models with high dimensional parameter space, these methods 
are unlikely to find the global optimum, and in this case more sophisti- 
cated methods need to be used. Multiple shooting [27] performs better 
in terms of avoiding getting stuck in local optima, but, as argued by 
Brewer et al. [28] may perform poorly when measurements are sparse 
and data are noisy. A large class of optimization methods are global 
optimization methods that try to explore complex surfaces as widely as 
possible; among these genetic algorithms are particularly well known 
and have been applied to ODE models [25]. Moles et al. [29] tested 
several global optimization algorithms on a 36-parameter biochemi- 
cal pathway model and showed that the best performing algorithm 
was a stochastic ranking evolutionary strategy [30] (software avail- 
able [31,32]). Further improvements in computational efficiency of 
this algorithm were obtained by hybrid algorithms incorporating local 
gradient search and multiple shooting methods [17,33]. 

To obtain an ensemble of good parameter values, an approach based 
on simulated annealing [34] and Monte Carlo search though parameter 
space can be used [35, 36] . In a Bayesian setting, MCMC methods [37] 
(software available [38]) and unscented Kalman filtering [39] have been 
applied to estimate the posterior distribution of parameters. Bayesian 
methods do not only estimate confidence intervals, but provide even 
more information by estimating of the whole posterior parameter dis- 
tribution. To obtain confidence intervals for a point estimate in a 
frequentist setting, a range of techniques can be applied that include 
variance-covariance matrix based techniques [40], profile likelihood [41] 
and bootstrap methods [42]. 

Parameter estimation should be accompanied by idcntifiability and 
sensitivity analyses. If a parameter is non-identifiable, this means it 
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is difficult or impossible to estimate due to either model structure 
(structural non-identifiability) or insufficient amount or quality or data 
measurements (statistical non-identifiability) [19,43,44]. Structurally 
non-identifiable parameters should ideally be removed from the model. 
Sensitivity analysis studies how model output behaves when varying 
parameters [45]. If model output changes a lot when parameters are 
varied slightly, we say that the model is sensitive to changes in certain 
parameter combinations. Recently, the related concept of sloppiness 
has been introduced by Sethna and co-workers [35,46]. They call a 
model sloppy when the parameter sensitivity eigenvalues are roughly 
evenly distributed over many decades; those parameter combinations 
with large eigenvalues are called sloppy and those with low eigenval- 
ues stiff. Sloppy parameters are hard to infer and carry very little 
discriminatory information about the model. The concepts of identi- 
fiability, sloppiness and parameter sensitivity are, of course, related: 
non-identifiable parameters and sloppy parameters are hard to esti- 
mate precisely because they can be varied a lot without having a large 
effect on model outputs; the corresponding parameter estimates will 
thus have large variances. A parameter with large variance can, in a 
sensitivity context, be interpreted as one to which the model is not 
sensitive if the parameter changes. 

3 Model selection 

Model selection methods strive to rank the candidate models, which 
represent hypothesis about the underlying system, relative to each 
other according to how well they explain the experimental data. Cru- 
cially, the chosen model is not the "true" model, but the best model 
from the set of candidate models. It is the one which we should prob- 
ably use for making inferences from the data. Generally, the more 
parameters are included in the model, the better a fit to the data can 
be achieved. If the number of parameters equals the number of data 
points, there is always a way of setting the parameters so that the fit 
will be perfect. This is called overfitting. Wcl [47] famously addressed 
a question of "how many parameters it takes to fit an elephant" , which 
practically suggests that if one takes a sufficiently large number of pa- 
rameters, a good fit can always be achieved. The other extreme is 
underfitting, which results from using too few parameters or too in- 
flexible a model. A good model selection algorithm should follow the 
principle of parsimony, also referred to as Occam's razor, which aims 
for to determine the model with the smallest possible number of pa- 
rameters that adequately represents the data and what is known about 
the system under consideration. 

The probably best known method for model selection is (frequentist) 
hypothesis testing. If ODE models are nested (i.e. one model can be 
obtained from the other by setting some parameter values to zero), 
then model selection is generally performed using the likelihood ratio 
test [16,48]. If both models have the same number of parameters and 
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if there is no underlying biological reason to choose one model over the 
other, then we choose the one which has a higher maximum likelihood. 
However, if the parameter numbers differ, then the likelihood ratio test 
penalizes overparameterization. 

If the models are not nested, then model selection becomes more diffi- 
cult but a variety of approaches have been developed that can be ap- 
plied in such (increasingly more common) situations. Bootstrap meth- 
ods [42,48] are based on drawing many so-called bootstrap samples 
from the original data by sampling with replacement, and calculating 
the statistic of interest (e.g. an achieved significance level of a hypoth- 
esis test) for all of these samples. This distribution is thin compared 
to the real data. 

Other model selection methods applicable to non-nested models are 
based on information-theoretic criteria [26] such as the Akaikc Infor- 
mation Criteria (AIC) [16,48-50]. These methods involve a goodness- 
of-fit term and a term measuring the paramctcric complexity of the 
model. The purpose of this complexity term is to penalize models with 
high number of parameters; the criteria by which this term should be 
chosen can differ considerably among the various proposed measures. 

In a Bayesian setting model selection is done through so-called Bayes 
factors (for comprehensive review see [51]). We consider two models, 
mi and ui2 and would like to determine which model explains the 
data x better. The Bayes factor measuring the support of model mi 
compared to model m 2 , is given by 

_ p{x\mi) _ J p(x\m 1 ,0i)p(Q 1 \mi)d9 1 
12 p(x\m 2 ) J p(x\m 2l 02)p(0 2 \m2)d92' 

To compute it, marginal likelihoods have to be computed, and this is 
done by integrating non-linear functions over all possible parameter 
combinations. This can be a challenging problem when the dimension 
of the parameter space is high, and Vyshcmirsky et al. asses various 
methods how this can be done efficiently [37]. A Bayesian version of 
information-theoretic model selection techniques introduced above is 
the Bayesian Information Criterion (BIC) [35,52], which is an approx- 
imation of the logarithm of the Bayes factor. Unlike the AIC, which 
tends towards overly complex models as the data saturates, the BIC 
chooses correct models in the limit of infinite data availability. 

There are several advantages of Bayesian model selection compared 
to traditional hypothesis testing. Firstly, the models being compared 
do not need to be nested. Secondly, Bayes factors do not only weigh 
the evidence against a hypothesis (in our case m 2 ), but can equally well 
provide evidence in favour of it. This is not the case for traditional 
hypothesis testing where a small p-value only indicates that the null 
model has insufficient explanatory power. However, one cannot con- 
clude from a large p- value that the two models are equivalent, or that 
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the null model is superior, but only that there is not enough evidence 
to distinguish between them. In other words, "failing to reject" the 
null hypothesis cannot be translated to "accepting" the null hypoth- 
esis [51,53]. Thirdly, unlike the posterior probability of the model, 
the p-valuc docs not provide any direct interpretation of the weight 
of evidence (the p-valuc is not the probability that the null hypoth- 
esis is true). We expect that Bayesian methods will also deal better 
with so-called sloppy parameters because they are based on explicit 
marginalization over model parameters. 

4 Approximate Bayesian computation tech- 
niques 

When formulating the likelihood for an ODE model, one normally 
assumes the Gaussian error distribution on the data points: by def- 
inition this is the only way of defining a likelihood for a determinis- 
tic model. Moreover, it might be hard to analytically work with the 
likelihood (e.g. finding maximum likelihood estimate and integrat- 
ing the marginal probabilities). Approximate Bayesian computation 
(ABC) methods have been conceived with the aim of inferring poste- 
rior distributions by circumventing the use of the likelihood, in favour 
of exploiting the computational efficiency of modern simulation tech- 
niques by replacing calculation of the likelihood with a comparison 
between the observed data and simulated data. These approaches are 
also straightforwardly applied to ODE model of signaling networks. 

Let 9 be a parameter vector to be estimated. Given the prior dis- 
tribution p(0), the goal is to approximate the posterior distribution, 
p(0\x) cx f(x\9)p(9), where f(x\6) is the likelihood of given the data 
x. ABC methods have the following generic form: 

1 Sample a candidate parameter vector 9* from some proposal distri- 

bution p{9). 

2 Simulate a data set x* from the model described by a conditional 

probability distribution f(x\9*). 

3 Compare the simulated data set, x* , to the experimental data, xo, 

using a distance function, d, and tolerance e; if d{x 0l x*) < e, 
accept 9* . The tolerance e > is the desired level of agreement 
between x and x* . 

The output of an ABC algorithm is a sample of parameters from a 
distribution p(9\d(x n , x*) < e). If e is sufficiently small then the distri- 
bution p(9\d(x , x*) < e) will be a good approximation for the "true" 
posterior distribution, p(9\x ). 

The most basic ABC algorithm outlined above is known as the ABC 
rejection algorithm; however, recently more sophisticated and com- 
putationally efficient ABC methods have been developed. They are 
based on Markov chain Monte Carlo (ABC MCMC) [54] and Sequential 
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Monte Carlo (ABC SMC) techniques [55-58]. They have recently been 
applied to dynamical systems modeled by ordinary differential equa- 
tions and stochastic master equations; ABC SMC has been developed 
for approximating the posterior distribution of the model parameters 
and for model selection using Bayes factors [56]. In the next section 
we illustrate the use of ABC SMC for parameter estimation and model 
selection in the context of the JAK-STAT signaling pathway. 

5 Application to JAK-STAT signaling path- 
way 

The JAK-STAT signalling pathway is involved in signalling through 
several surface receptors and STAT proteins, which act as signal trans- 
ducers and activators of transcription [59,60]. Here we look at models 
of signalling though the erythropoietin receptor (EpoR) , transduced by 
STAT5 (Figure[l]). Signalling through this receptor is crucial for prolif- 
eration, differentiation, and survival of erythroid progenitor cells [61]. 




Figure 1: STAT5 signalling pathway. 

When the Epo hormone binds to the EpoR receptor, the receptor's cy- 
toplasmic domain becomes phosporylated, which creates a docking site 
for signalling molecules, in particular the transcription factor STAT5. 
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Upon binding to the activated receptor, STAT5 first becomes phos- 
phorylated, then dimerizes and translocates to the nucleus, where it 
acts as a transcription factor. There have been competing hypotheses 
about what happens with STAT proteins at the end of the signalling 
pathway. Originally it had been suggested that STAT proteins get de- 
graded in the nucleus in an ubiquitin-asssociated way [62], while other 
evidence suggests that they are dephosphorylated in the nucleus and 
then transported back to the cytoplasm [63]. 

Here we want to understand how STAT5 protein transduces the signal 
from the receptor in the membrane through the cytoplasm into the 
nucleus. We have approached this problem by applying ABC SMC 
for model selection and parameter estimation to data collected for the 
JAK-STAT signaling pathway. The most suitable model from model 
of a STAT5 part of the JAK-STAT signaling pathway among the three 
proposed models was chosen and parameters have been estimated. 

The ambiguity about the shutoff mechanism of STAT5 in the nucleus 
triggered the development of several mathematical models [16,48,64], 
each describing a different hypothesis. These models were then fitted 
to experimental data and systematically compared to each other using 
statistical methods of model selection. The model selection procedure 
ruled in favour of a cycling model, where STAT5 reenters the cyto- 
plasm. 

Timmcr et al. [16,48,64] developed a continuous mathematical model 
for STAT5 signalling pathyway, comprising of four differential equa- 
tions. They assume mass action kinetics and denote the amount of acti- 
vated Epo-receptors by EpoR,A, monomeric unphosphorylated STATS 
molecules by x\, monomeric phosphorylated STAT5 molecules by X2, 
dimeric phosphorylated STAT5 in the cytoplasm by x 3 and dimeric 
phosphorylated STAT5 in the nucleus by X4. The most basic model 
Timmcr et al. developed, under the assumption that phosphorylated 
STAT5 does not leave the nucleus, consists of the following kinetic 
equations: 

x\ = —kiXiEpoRA (1) 

X2 = —k,2x\ + kiXiEpoR-A 

x 3 = -k 3 x 3 + ^k 2 xj 

± 4 = k 3 x 3 . (2) 

One can then assume that phosphorylated STAT5 de-dimerizes and 
leaves the nucleus and this can be modelled by adding appropriate 
kinetic terms to the equations and ^ of the basic model: 

x% = —kiXiEpoRA + 2/C4X4 
X4 = k 3 x 3 — £4X4. 

Timmcr et al. develop their cycling model further by assuming a delay 
in moving of STAT5 out of the nucleus. They write ODE equations 
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for x\ and X4 for this model as 

±1 = —kiXiEpoRA + 2k^x?,(t — t) (3) 
x 4 = k 3 x 3 - k 4 x 3 (t - r), (4) 

while equations for x 2 and x 3 remain as above. The outcome of their 
statistical analysis is that this model fits the data best, which leads 
them to the conclusion that this is the most appropriate model. 

Instead of Timmer's chosen model, we propose a similar model with 
clear physical interpretation. Instead with x 3 (t — r), we propose to 
model the delay of phosphorylated STAT5 x 4 in the nucleus with 
Xi(t — t): 

±1 = — kiXiEpoRA + 2k4X4(t — r) 
±4 = k 3 x 3 — k^x^t — t). 

We have performed the ABC SMC model selection algorithm [56] on 
the following models: (1) Cycling delay model with x 3 (t — r), (2) Cy- 
cling delay model with x 4 (t — r), (3) Cycling model without a delay. 
The model parameter m can therefore take values 1, 2 and 3. 

Figure [2] shows intermediate populations leading to the approximation 
of the marginal posterior distribution of m (population 15). Bayes fac- 
tors can be calculated from the last population and according to the 
conventional interpretation of Bayes factors [51], it can be concluded 
that there is very strong evidence in favour of models 2 and 3 com- 
pared to model 1. However, there is only moderate evidence for model 
3 being more suitable than model 2. 



6 Discussion 

Modeling biological signaling or regulatory systems requires reliable 
parameter estimates. But the experimental dissection of signaling path- 
ways is costly and laborious; it furthermore seems unreasonable to 
believe that the same set of parameters describes a system across all 
possible environmental, physiological and developmental conditions. 
We are therefore reliant on efficient and reliable statistical and compu- 
tational methods in order to estimate parameters and, more generally, 
reverse engineer mechanistic models. 

As we have argued above, any such estimate must include a meaningful 
measure of uncertainty. A rational approach to modeling such systems 
should furthermore allow for the comparison of competing models in 
light of available data. The relative new ABC approaches are able to 
meet both objectives. Furthermore as we have shown elsewhere they 
are not limited to deterministic modelling approaches but arc also read- 
ily applied to explicitly stochastic dynamics; in fact it is possible to 
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Figure 2: Histograms show populations of the model parameter m. Population 
13 represents the approximation of the marginal posterior distribution of m. The 
red sections present 25% and 75% quantiles around the median. 



compare the explanatory power of deterministic and stochastic dynam- 
ics in the same mechanistic model. 

One of the principal reasons for applying sound inferential procedures 
in the context of dynamical systems is to get a realistic appraisal of 
the robustness of these systems. If, as has been claimed, only a small 
set of parameters determines the system outputs then we have to as- 
certain these with certainty. It is here, in the reverse engineering of 
potentially sloppy dynamical systems, where the Bayesian perspective 
may be most beneficial. 



10 



References 



[1] Klipp E and Liebermeister W. Mathematical modeling of intracellular 
signaling pathways. BMC Neurosci, 7 Suppl 1:S10, 2006. 

[2] Neves S and Iyengar R. Modeling of signaling networks. BioEssays, 
24:1110 - 1117, 2002. 

[3] Levchenko A. Dynamical and integrative cell signaling: challenges for 
the new biology. Biotechnol Bioeng, 84(7):773-82, 2003. 

[4] Cho K and Wolkenhauer O. Analysis and modelling of signal transduc- 
tion pathways in systems biology. Biochemical Society Transactions, 
31:1503-1509, 2003. 

[5] Papin J, Hunter T, Palsson B and Subramaniam S. Reconstruction of 
cellular signalling networks and analysis of their properties. Nat Rev 
Mol Cell Biol, 6(2):99-lll, 2005. 

[6] Fujioka A, Terai K, Itoh RE, Aoki K, Nakamura T, Kuroda S, Nishida 
E and Matsuda M. Dynamics of the Ras/ERK MAPK cascade as 
monitored by fluorescent probes. J Biol Chem, 281(13):8917-26, 2006. 

[7] Apgar JF, Toettcher JE, Endy D, White FM and Tidor B. Stimulus 
design for model selection and validation in cell signaling. PLoS Comput 
Biol, 4(2):e30, 2008. 

[8] Markevich NI, Hoek JB and Kholodenko BN. Signaling switches and 
bistability arising from multisite phosphorylation in protein kinase cas- 
cades. J Cell Biol, 164(3) :353-9, 2004. 

[9] Babu C, Yoon S, Nam H and Yoo Y. Simulation and sensitivity analysis 
of phosphorylation of EGFR signal transduction pathway in PC12 cell 
model. Systems biology, 213-221, 2004. 

[10] Babu CS, Song EJ and Yoo Y. Modeling and simulation in signal 
transduction pathways: a systems biology approach. Biochimie, 88:277- 
283, 2006. 

[11] Conzelmann H, Saez-Rodriguez J and Sauter T. Reduction of mathe- 
matical models of signal transduction networks: simulation-based ap- 
proach applied to EGF receptor signalling. Systems biology, 1:159-169, 
2004. 

[12] Kolch W, Calder M and Gilbert D. When kinases meet mathematics: 
the systems biology of MAPK signalling. FEBS Lett, 579:1891-1895, 
2005. 

[13] Andrec M, Kholodenko B, Levy R and Sontag E. Inference of signaling 
and gene regulatory networks by steady-state perturbation experiments: 
structure and accuracy. J Theor Biol, 232:427-441, 2005. 

[14] Schoeberl B, Eichler-Jonsson C, Gilles E and Miiller G. Computational 
modeling of the dynamics of the MAP kinase cascade activated by sur- 
face and internalized EGF receptors. Nat Biotechnol, 20(4):370-5, 2002. 

[15] Aaronson D and Horvath C. A road map for those who don't know 
JAK-STAT. Science, 296(5573) :1653, 2002. 

[16] Swameye I, Muller TG, Timmer J, Sandra O and Klingmuller U. Iden- 
tification of nucleocytoplasmic cycling as a remote sensor in cellular sig- 
naling by databased modeling. Proc Natl Acad Sci USA, 100(3): 1028- 
33, 2003. 



11 



[17] Balsa-Canto E, Pcifcr M, Banga JR, Timmer J and Fleck C. Hybrid 
optimization method with general switching strategy for parameter es- 
timation. BMC Systems Biology, 2:26, 2008. 

[18] Modchang C, Triampo W and Lenbury Y. Mathematical modeling 
and application of genetic algorithm to parameter estimation in signal 
transduction: trafficking and promiscuous coupling of G-protein cou- 
pled receptors. Comput Biol Med, 38(5):574-82, 2008. 

[19] Yuc H, Brown M, Knowles J, Wang H, Broomhead DS and Kell DB. 
Insights into the behaviour of systems biology models from dynamic 
sensitivity and identifiability analysis: a case study of an NF-kappaB 
signalling pathway. Mol Biosyst, 2(12):640-9, 2006. 

[20] Schwartz MA and Baron V. Interactions between mitogenic stimuli, or, 
a thousand and one connections. Curr Opin Cell Biol, ll(2):197-202, 
1999. 

[21] Tyson J, Chen K and Novak B. Sniffers, buzzers, toggles and blinkers: 
dynamics of regulatory and signaling pathways in the cell. Curr Opvn 
Cell Biol, 15(2):221-31, 2003. 

[22] Bhalla US and Iyengar R. Emergent properties of networks of biological 
signaling pathways. Science, 283(5400) :381-7, 1999. 

[23] Heinrich R, Neel B and Rapoport T. Mathematical models of protein 
kinase signal transduction. Mol Cell, 9(5):957-70, 2002. 

[24] Saez-Rodriguez J, Kremling A and Conzelmann H. Modular analysis 
of signal transduction networks. Control Systems Magazine, 24:35-52, 
2004. 

[25] Vera J, Bachmann J, Pfeifer A, Becker V, Hormiga J, Darias N, Timmer 
J, Klingmiiller U and Wolkenhauer O. A systems biology approach to 
analyse amplification in the JAK2-STAT5 signalling pathway. BMC 
Systems Biology, 2:38, 2008. 

[26] Burnham K and Anderson D. Model selection and multimodel infer- 
ence: A practical information-theoretic approach, 2002. 

[27] Peifer M and Timmer J. Parameter estimation in ordinary differen- 
tial equations for biochemical processes using the method of multiple 
shooting. IET Systems Biology, l(2):78-88, 2007. 

[28] Brewer D, Barenco M, Callard R, Hubank M and Stark J. Fitting or- 
dinary differential equations to short time course data. Philos Transact 
A Math Phys Eng Sci, 366:519-544, 2007. 

[29] Moles C, Mendes P and Banga J. Parameter estimation in biochemical 
pathways: a comparison of global optimization methods. Genome Res, 
13(ll):2467-74, 2003. 

[30] Runarsson T and Yao X. Stochastic ranking for constrained evolution- 
ary optimization. Ieee T Evolut Comput, 4:284-294, 2000. 

[31] Ji X and Xu Y. libSRES: a C library for stochastic ranking evolution 
strategy for parameter estimation. Bioinformatics, 22:124-126, 2006. 

[32] Zi Z and Klipp E. SBML-PET: a Systems Biology Markup Language- 
based parameter estimation tool. Bioinformatics, 22(21):2704-5, 2006. 

[33] Rodriguez-Fernandez M, Mendes P and Banga J. A hybrid approach 
for efficient and robust parameter estimation in biochemical pathways. 
Biosystems, 83:248-265, 2006. 



12 



Kirkpatrick S, Gelatt C and Vecchi M. Optimization by simulated 
annealing. Science, 220(4598) :671-680, 1983. 

Brown KS and Sethna JP. Statistical mechanical approaches to models 
with many poorly known parameters. Physical review E, 68:021904, 
2003. 

Brown KS, Hill CC, Calero GA, Myers CR, Lee KH, Sethna JP and 
Cerione RA. The statistical mechanics of complex signaling networks: 
nerve growth factor signaling. Physical biology, 1(3-4): 184-95, 2004. 

Vyshemirsky V and Girolami MA. Bayesian ranking of biochemical 
system models. Bioinformatics, 24(6):833-9, 2008. 

Vyshemirsky V and Girolami M. BioBayes: a software package for 
Bayesian inference in systems biology. Bioinformatics, 24(17):1933-4, 
2008. 

Quach M, Brunei N and d'Alche Buc F. Estimating parameters and 
hidden variables in non-linear state-space models based on odes for 
biological networks inference. Bioinformatics, 23(23) :3209-16, 2007. 

Bard Y. Nonlinear parameter estimation, 1974. 

Venzon D and Moolgavkar S. A method for computing profile- 
likclihood-based confidence intervals. Applied Statistics, 37:87-94, 1988. 

Efron B and Tibshirani R. An introduction to the bootstrap, 1993. 

Hengl S, Kreutz C, Timmer J and Maiwald T. Data-based identifiability 
analysis of non-linear dynamical models. Bioinformatics, 23(19):2612- 
8, 2007. 

Schmidt H, Madsen MF, Dan0 S and Cedersund G. Complexity re- 
duction of biochemical rate expressions. Bioinformatics, 24(6):848-54, 
2008. 

Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, 
Saisana M and Tarantola S. Global sensitivity analysis: The primer. 
John Wiley and Sons, 2008. 

Gutenkunst R, Waterfall J, Casey F, Brown K, Myers C and Sethna J. 
Universally sloppy parameter sensitivities in systems biology models. 
PLoS Comput Biol, 3(10):el89, 2007. 

Wei J. Least squares fitting of an elephant. Chemtech, 128-129, 1975. 

Timmer J and Muller T. Modeling the nonlinear dynamics of cellular 
signal transduction. Int J Bifurcat Chaos, 14:2069-2079, 2004. 

Akaike H. Information theory as an extension of the maximum likeli- 
hood principle. Second International Symposium on Information The- 
ory (Akademiai Kiado, Budapest), 267-281, 1973. 

Akaike H. A new look at the statistical model identification. Automatic 
Control, 19:716-723, 1974. 

Kass R and Raftery A. Bayes factors. Journal of the American Statis- 
tical Association, 90:773-795, 1995. 

Schwarz G. Estimating the dimension of a model. The Annals of Statis- 
tics, 6:461-464, 1978. 

Cox RD and Hinkley DV. Theoretical statistics. Chapman & Hall/CRC, 
New York, 1974. 



13 



[54] Marjoram P, Molitor J, Plagnol V and Tavare S. Markov chain Monte 
Carlo without likelihoods. Proc Natl Acad Sci USA, 100(26): 15324-8, 
2003. 

[55] Sisson SA, Fan Y and Tanaka MM. Sequential Monte Carlo without 
likelihoods. Proc Natl Acad Sci USA, 104(6): 1760-5, 2007. 

[56] Toni T, Welch D, Strelkowa N, Ipsen A and Stumpf MPH. Approxi- 
mate bayesian computation scheme for parameter inference and model 
selection in dynamical systems. J. R. Soc. Interface, 6:187-202, 2009. 

[57] Beaumont MA, Cornuet JM, Marin JM and Robert CP. Adaptive 
approximate bayesian computation. arXiv, stat.CO, 2008. 

[58] Moral PD, Doucet A and Jasra A. An adaptive sequential monte carlo 
method for approximate bayesian computation. 

[59] Darnell JE. STATs and gene regulation. Science, 277(5332): 1630-5, 
1997. 

[60] Horvath CM. STAT proteins and transcriptional responses to extracel- 
lular signals. Trends Biochem Set, 25(10):496-502, 2000. 

[61] Klingmullcr U, Bergelson S, Hsiao JG and Lodish HF. Multiple tyrosine 
residues in the cytosolic domain of the erythropoietin receptor promote 
activation of STAT5. Proc Natl Acad Sci USA, 93(16):8324-8, 1996. 

[62] Kim TK and Maniatis T. Regulation of interferon-gamma-activated 
STAT1 by the ubiquitin-proteasome pathway. Science, 273(5282): 1717- 
9, 1996. 

[63] Koster M and Hauser H. Dynamic redistribution of STAT1 protein 
in IFN signaling visualized by GFP fusion proteins. Eur J Biochem, 
260(l):137-44, 1999. 

[64] Muller TG, Faller D, Timmer J, Swameye I, Sandra O and Klingmiiller 
U. Tests for cycling in a signalling pathway. Journal of the Royal 
Statistical Society Series C, 53(4) :557, 2004. 



14 



